load balancer
A Meta-Heuristic Load Balancer for Cloud Computing Systems
Sliwko, Leszek, Getov, Vladimir
This is the accepted author's version of the paper. The final published version is available in the 2015 IEEE 39th Annual Computer Software and Applications Conference, vol. Abstract -- This paper presents a strategy to allocate services on a Cloud system without overloading nodes while maintaining system stability at minimum cost. We specify an abstract model of cloud resource utilization, including multiple types of resources as well as considerations for service migration costs. A prototype meta-heuristic load balancer is demonstrated, and experimental results are presented and discussed. We also propose a novel genetic algorithm, where the population is seeded with the outputs of other meta-heuristic algorithms. Modern-day applications are often designed so that they can simultaneously use resources from different computing environments. System components are no longer tied to individual machines, and in many respects they can be viewed as though they are deployed in a single application environment. Distributed computing differs from traditional computing in many ways.
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
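The abstract's key idea, a genetic algorithm whose initial population is seeded with solutions from other heuristics, can be illustrated with a minimal sketch. This is not the paper's algorithm; the first-fit heuristic, fitness function (load variance), and GA parameters below are illustrative assumptions:

```python
import random

def first_fit(services, capacity, n_nodes):
    """Greedy heuristic: place each service on the first node with spare capacity."""
    loads = [0.0] * n_nodes
    plan = []
    for s in services:
        for i in range(n_nodes):
            if loads[i] + s <= capacity:
                loads[i] += s
                plan.append(i)
                break
        else:  # no node has room: fall back to the least-loaded node
            i = min(range(n_nodes), key=lambda j: loads[j])
            loads[i] += s
            plan.append(i)
    return plan

def fitness(plan, services, n_nodes):
    """Lower is better: variance of per-node load (imbalance)."""
    loads = [0.0] * n_nodes
    for node, s in zip(plan, services):
        loads[node] += s
    mean = sum(loads) / n_nodes
    return sum((l - mean) ** 2 for l in loads)

def seeded_ga(services, capacity, n_nodes, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    # Seed the population with a heuristic solution, then fill with random plans.
    population = [first_fit(services, capacity, n_nodes)]
    while len(population) < pop_size:
        population.append([rng.randrange(n_nodes) for _ in services])
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, services, n_nodes))
        survivors = population[: pop_size // 2]  # keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(services))
            child = a[:cut] + b[cut:]            # one-point crossover
            if rng.random() < 0.2:               # mutation: move one service
                child[rng.randrange(len(services))] = rng.randrange(n_nodes)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda p: fitness(p, services, n_nodes))
```

Because the best plan always survives each generation, the GA's result is never worse than the heuristic seed it started from, which is the point of seeding.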
Glia: A Human-Inspired AI for Automated Systems Design and Optimization
Hamadanian, Pouya, Karimi, Pantea, Nasr-Esfahany, Arash, Noorbakhsh, Kimia, Chandler, Joseph, ParandehGheibi, Ali, Alizadeh, Mohammad, Balakrishnan, Hari
Can an AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that uses large language models (LLMs) in a human-inspired, multi-agent workflow. Each agent specializes in reasoning, experimentation, and analysis, collaborating through an evaluation framework that grounds abstract reasoning in empirical feedback. Unlike prior ML-for-systems methods that optimize black-box policies, Glia generates interpretable designs and exposes its reasoning process. When applied to a distributed GPU cluster for LLM inference, it produces new algorithms for request routing, scheduling, and auto-scaling that perform at human-expert levels in significantly less time, while yielding novel insights into workload behavior. Our results suggest that by combining reasoning LLMs with structured experimentation, an AI can produce creative and understandable designs for complex systems problems.
- Asia > Middle East > Jordan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- North America > United States > California > San Diego County > Carlsbad (0.04)
- Information Technology (0.46)
- Transportation (0.34)
Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs
Kossmann, Ferdi, Fontaine, Bruce, Khudia, Daya, Cafarella, Michael, Madden, Samuel
Serving systems for Large Language Models (LLMs) improve throughput by processing several requests concurrently. However, multiplexing hardware resources between concurrent requests involves non-trivial scheduling decisions. Practical serving systems typically implement these decisions at two levels: First, a load balancer routes requests to different servers which each hold a replica of the LLM. Then, on each server, an engine-level scheduler decides when to run a request, or when to queue or preempt it. Improved scheduling policies may benefit a wide range of LLM deployments and can often be implemented as "drop-in replacements" to a system's current policy. In this work, we survey scheduling techniques from the literature and from practical serving systems. We find that schedulers from the literature often achieve good performance but introduce significant complexity. In contrast, schedulers in practical deployments often leave easy performance gains on the table but are easy to implement, deploy and configure. This finding motivates us to introduce two new scheduling techniques, which are both easy to implement, and outperform current techniques on production workload traces.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > San Diego County > Carlsbad (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
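The two-level structure the abstract describes (a cluster-level load balancer routing requests to replicas, plus an engine-level scheduler admitting or queueing them) can be sketched in a few lines. The class names, the least-loaded routing rule, and the FIFO admission policy are illustrative assumptions, not the paper's techniques:

```python
from collections import deque

class EngineScheduler:
    """Engine-level policy: run up to max_concurrent requests, queue the rest."""
    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.running = 0
        self.queue = deque()

    def submit(self, request_id):
        if self.running < self.max_concurrent:
            self.running += 1
            return "running"
        self.queue.append(request_id)
        return "queued"

    def finish(self):
        # A completed request frees a slot; admit the next queued request, if any.
        if self.queue:
            self.queue.popleft()
        else:
            self.running -= 1

class Router:
    """Cluster-level policy: route each request to the least-loaded replica."""
    def __init__(self, servers):
        self.servers = servers

    def route(self, request_id):
        target = min(self.servers, key=lambda s: s.running + len(s.queue))
        return target, target.submit(request_id)
```

With three replicas each allowing two concurrent requests, the seventh request in a burst is the first to be queued rather than run immediately.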
Running a Stable Diffusion Cluster on GCP with tensorflow-serving (Part 2)
In part 1, we learned how to use Terraform to set up and manage our infrastructure conveniently. In this part, we will continue our journey and deploy a running Stable Diffusion model on the provisioned cluster. Note: you can follow this tutorial end-to-end even as a free user (as long as you have some free-tier credits left). Let's take a look at what the final result will be. If you gradually add a bit of noise to an image over many steps, you eventually end up with pure noise.
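The gradual-noising idea behind diffusion models can be sketched numerically. This is a toy illustration of the forward (noise-adding) process on a list of pixel values, not the tutorial's serving code; the step count and `beta` schedule are invented for the example:

```python
import math
import random

def diffuse(image, steps, beta=0.1, seed=0):
    """Toy forward diffusion: each step slightly shrinks the signal and
    mixes in Gaussian noise, so after many steps only noise remains."""
    rng = random.Random(seed)
    x = list(image)
    for _ in range(steps):
        x = [math.sqrt(1 - beta) * px + math.sqrt(beta) * rng.gauss(0, 1)
             for px in x]
    return x
```

After `steps` iterations, the surviving fraction of the original signal is `(1 - beta) ** (steps / 2)`, which is why enough steps leave an image that is indistinguishable from noise; the model is then trained to reverse this process.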
Creating a Machine Learning App using FastAPI and Deploying it Using Kubernetes
FastAPI is a modern Python-based web framework used to create Web APIs. FastAPI is fast at serving requests and enhances the performance of your application. Note: to follow along easily, use Google Colab; it's an easy-to-use platform for getting started quickly while building models. We will build a machine learning model that predicts the nationality of individuals from their names. This is a simple model that illustrates the key concepts used in machine learning modeling. The dataset contains common names of people and their nationalities. Pandas is a software library written for the Python programming language for data manipulation and analysis.
- Information Technology > Software > Programming Languages (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.75)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
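The shape of such an app can be sketched as a plain prediction function wrapped in a FastAPI endpoint. The suffix rules below are a toy stand-in for the tutorial's trained model, and the endpoint path is an invented example; the FastAPI import is guarded so the classifier also works standalone:

```python
# Toy name-to-nationality classifier: the suffix rules are invented
# placeholders, not the tutorial's actual trained model.
SUFFIX_RULES = {
    "ov": "Russian", "ez": "Spanish", "sen": "Danish",
    "oglu": "Turkish", "ski": "Polish",
}

def predict_nationality(name: str) -> str:
    lowered = name.lower()
    for suffix, nationality in SUFFIX_RULES.items():
        if lowered.endswith(suffix):
            return nationality
    return "Unknown"

try:
    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/predict")
    def predict(name: str):
        # e.g. GET /predict?name=Petrov -> {"name": "Petrov", "nationality": "Russian"}
        return {"name": name, "nationality": predict_nationality(name)}
except ImportError:
    # FastAPI is optional here; the classifier itself has no dependencies.
    pass
```

For Kubernetes deployment, this module would typically be served with `uvicorn`, containerized, and exposed through a Service, as the article goes on to describe.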
Reinforced Workload Distribution Fairness
Yao, Zhiyuan, Ding, Zihan, Clausen, Thomas Heide
Network load balancers are central components in data centers that distribute workloads across multiple servers and thereby contribute to offering scalable services. However, when load balancers operate in dynamic environments with limited monitoring of application server loads, they rely on heuristic algorithms that require manual configuration for fairness and performance. To alleviate that, this paper proposes a distributed asynchronous reinforcement learning mechanism to improve the fairness of the workload distribution achieved by a load balancer, with no active load balancer state monitoring and only limited network observations. The performance of the proposed mechanism is evaluated and compared with state-of-the-art load balancing algorithms in a simulator, under configurations of progressively increasing complexity. Preliminary results show promise for RL-based load balancing algorithms, and identify additional challenges and future research directions, including reward function design and model scalability.
- Europe > France (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > New Jersey (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
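Workload distribution fairness, the quantity this paper optimizes, has a standard measure: Jain's fairness index, which could plausibly serve as a reward signal for such an RL agent. Whether the paper uses this exact metric is an assumption; the sketch below just shows how the index behaves:

```python
def jain_fairness(loads):
    """Jain's fairness index over per-server loads.
    Equals 1.0 when all servers carry equal load, and approaches 1/n
    when a single server carries everything."""
    n = len(loads)
    total = sum(loads)
    if total == 0:
        return 1.0  # nothing to distribute: trivially fair
    return total ** 2 / (n * sum(x * x for x in loads))
```

For four servers, a perfectly even split scores 1.0, while routing every request to one server scores 0.25, giving a smooth signal an agent can climb.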
Towards Intelligent Load Balancing in Data Centers
Yao, Zhiyuan, Desmouceaux, Yoann, Townsley, Mark, Clausen, Thomas Heide
Network load balancers are important components in data centers that provide scalable services. Workload distribution algorithms are based on heuristics, e.g., Equal-Cost Multi-Path (ECMP) and Weighted-Cost Multi-Path (WCMP), or naive machine learning (ML) algorithms, e.g., ridge regression. Advanced ML-based approaches help achieve performance gains in various networking and system problems. However, applying ML algorithms to networking problems in real-life systems is challenging: it requires domain knowledge to collect features from low-latency, high-throughput, and scalable networking systems, which are dynamic and heterogeneous. This paper proposes Aquarius to bridge the gap between ML and networking systems, and demonstrates its usage in the context of network load balancers, covering both offline data analysis and online model deployment in realistic systems. The results show that the ML model trained and deployed using Aquarius improves load balancing performance, yet they also reveal further challenges to be resolved before ML can be broadly applied to networking systems.
- Oceania > Australia > New South Wales > Sydney (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Information Technology > Services (0.85)
- Energy > Power Industry (0.62)
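The ECMP/WCMP heuristics this abstract contrasts with ML approaches amount to hashing each flow onto a (possibly weighted) set of next hops. A minimal sketch, assuming small integer weights and using MD5 purely as a stable hash (real implementations hash packet header fields in hardware):

```python
import hashlib

def wcmp_pick(flow_key: str, servers: dict) -> str:
    """WCMP-style selection: hash the flow key onto a weight-expanded
    server list, so the same flow always reaches the same backend.
    With all weights equal, this degenerates to ECMP."""
    buckets = [name
               for name, weight in sorted(servers.items())
               for _ in range(weight)]
    digest = hashlib.md5(flow_key.encode()).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]
```

The appeal of these heuristics is that they are stateless and deterministic per flow; their limitation, which motivates ML-based alternatives like the one above, is that they ignore actual server load.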
How to run machine learning at scale -- without going broke
Machine learning is computationally expensive -- and because serving real-time predictions means running your ML models in the cloud, that computational expense translates into real dollars. For example, if you wanted to add a translation feature to your app that automatically translated text to your user's preferred language, you would deploy an NLP model as a web API for your app to consume. To host this API, you would need to deploy it through a cloud provider like AWS, put it behind a load balancer, and implement some kind of autoscaling functionality (probably involving Docker and Kubernetes). None of the above is free, and if you're dealing with a large amount of traffic, the total cost can get out of hand. This is especially true if you aren't optimizing your spend.
End to End Machine Learning: From Data Collection to Deployment
This started out as a challenge. With a friend of mine, we wanted to see if it was possible to build something from scratch and push it to production. In this post, we'll go through the necessary steps to build and deploy a machine learning application. This starts from data collection and goes all the way to deployment, and the journey, as you'll see, is exciting and fun. Before we begin, let's have a look at the app we'll be building: As you see, this web app allows a user to evaluate random brands by writing reviews. While writing, the user will see the sentiment score of their input updating in real time, along with a proposed rating from 1 to 5. The user can then change the rating in case the suggested one does not reflect their views, and submit. You can think of this as a crowdsourcing app of brand reviews with a sentiment analysis model that suggests ratings that the user can tweak and adapt afterwards. To build this application we'll follow these steps: All the code is available in our github repository and organized in independent directories, so you can check it, run it and improve it. Disclaimer: The scripts below are meant for educational purposes only: scrape responsibly. In order to train a sentiment classifier, we need data. We could certainly download open-source datasets for sentiment analysis tasks such as Amazon Polarity or IMDB movie reviews, but for the purpose of this tutorial, we'll build our own dataset.
- Workflow (0.46)
- Instructional Material (0.34)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.89)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.89)
A load balancer that learns, WebTorch – UnifyID
In my previous blog post, "How I stopped worrying and embraced docker microservices", I talked about why microservices are the bee's knees for scaling machine learning in production. A fair amount of time has passed (almost a year, whoa), and it has proved that building Deep Learning pipelines in production is a more complex, multi-aspect problem. Yes, microservices are an amazing tool, both for software reuse, distributed systems design, quick failure and recovery, yada yada. But what seems very obvious now is that machine learning services are very stateful, and statefulness is a problem for horizontal scaling. An easy way to deal with this issue is to understand that ML models are large, and thus should not be context-switched.
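One standard way to avoid context-switching large models, though the post doesn't name it, is sticky routing via consistent hashing: a given session or model always maps to the same server, so the model stays resident in that server's memory. A minimal sketch with invented server names:

```python
import bisect
import hashlib

class ConsistentHashRouter:
    """Sticky routing for stateful model servers: each key always maps to
    the same server, so a large model loaded there never has to be swapped."""
    def __init__(self, servers, replicas=100):
        # Place several virtual points per server on the hash ring
        # so load spreads evenly.
        self.ring = []
        for server in servers:
            for i in range(replicas):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def route(self, session_key):
        # First ring point clockwise from the key's hash (wrapping around).
        idx = bisect.bisect(self.ring, (self._hash(session_key),)) % len(self.ring)
        return self.ring[idx][1]
```

The consistent-hashing property also means that adding or removing a server only remaps a small fraction of sessions, rather than reshuffling every model across the fleet.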